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Abstract 



Given a set of integers, one can easily construct the set of their pairwise distances. 
We consider the inverse problem: given a set of pairwise distances, find the integer 
set which realizes the pairwise distance set. This problem arises in a lot of fields in 
engineering and applied physics, and has confounded researchers for over 60 years. It is 
one of the few fundamental problems that are neither known to be NP-hard nor solvable 
by polynomial-time algorithms. Whether unique recovery is possible also remains an 
open question. 

In many practical applications where this problem occurs, the integer set is natu- 
rally sparse (i.e., the integers are sufficiently spaced), a property which has not been 
explored. In this work, we exploit the sparse nature of the integer set and develop 
a polynomial-time algorithm which provably recovers the set of integers (up to linear 
shift and reversal) from the set of their pairwise distances with arbitrarily high proba- 
bility if the sparsity is (^(n 1 / 2-6 ). Numerical simulations verify the effectiveness of the 
proposed algorithm. 

1 Introduction 

We consider the problem of reconstructing a set of integers from the set of their pairwise 
distances. For example, consider the set V = {2,5, 13,31,44}. Its pairwise distance set is 
given by W = {0, 3, 8, 11, 13, 18, 26, 29, 31, 39, 42}. We look at the problem of recovering the 
integer set V from the pairwise distance set Wf\ 




This recovery problem dates back to the origins of the classical phase retrieval problem 
in the 1930s [HE] and has received a lot of attention from researchers. More recently, it has 
arisen in computational biology, specifically in restriction site mapping of DNA [3j. This 
problem has also been posed as a computational geometry problem [I]. 

1"If V has a pairwise distance set W, then sets c ± V also have the same pairwise distance set W for any 
integer c. These solutions are considered equivalent, and in all the applications it is considered good enough 
if any equivalent solution, i.e., up to linear translation and flipping, is recovered. 
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1.1 Phase Retrieval 



Many measurement systems in practice can output only the squared-magnitude of the Fourier 
transform. Phase information is completely lost, because of which signal recovery is difficult. 
This is a fundamental problem in many fields, including optics [5], X-ray crystallography [6], 
astronomical imaging [7], speech processing [8], particle scattering, electron microscopy etc. 

Recovering a signal from its Fourier transform magnitude is known as phase retrieval. 
Since squared-magnitude of the Fourier transform and autocorrelation are Fourier pairs, the 
phase retrieval problem can be equivalently posed as recovering a signal from its autocorre- 
lation. 

Let x = {xq, £i, ....x n -i} be a discrete-time signal of length n and sparsity fc, where 
sparsity is defined as the number of non-zero elements. Its autocorrelation, denoted by 
a = {a , ai, ....a n _i}, is defined as 

a i d ~ x i x i+i = ( x * C 1 ) 

3 

where x is the time-reversed version of x. Also, let V and W denote the support set of the 
signal x and its autocorrelation a respectively, defined as 

V = {i\x l7 ^0} & W = {i\a l7 ^0} (2) 

The phase retrieval problem can be written as 

find x 

subject to x ★ x = a (3) 

Connection to integer recovery problem: It is often useful to be able to reconstruct 
the support set of the signal V from the support set of the autocorrelation W [13J. In 
many applications (e.g, astronomy), the signal's support set is the desired information. In 
other applications, support knowledge makes signal reconstruction process using available 
techniques significantly easier [2]. For example, in [21J we provide an algorithm which 
provably recovers sparse signals uniquely from their autocorrelation if the support set is 
known. 

We will assume that if = 0, then no two elements in x are separated by a distance z, 

i.e., 

en = => XjXi+j = V j (4) 

This is a very weak assumption and holds with probability one if the non-zero entries of the 
signal are chosen from a non-degenerate distribution. With this assumption, the support 
recovery problem can be posed as 

find V 

subject to {|i - j\ \ (ij) £ V} = W (5) 

Note that V is a set of integers, and W is exactly its pairwise distance set. 
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1.2 Computational Biology 

Over the last few years, there has been a lot of interest in DNA restriction site analysis. A 
DNA strand is a string on the letters {A, T, G, C}. Unfortunately, the DNA string cannot 
be explicitly observed and in order to map it, biochemical techniques which provide indirect 
information have been developed. 

When a particular restriction enzyme is added to a DNA solution, the DNA is cut at 
particular restriction sites. For example, the enzyme EcoRI cuts at locations of the pattern 
GAATTC. The goal of restriction site analysis is to determine the locations of every site 
for a given enzyme. In order to do this, a batch of DNA is exposed to a restriction enzyme 
in limited quantity so that fragments of all possible lengths exist (see Figure [T]) . Using gel 
electrophoresis, the fragment lengths can be measured. 

GAATTC GAATTC GAATTC GAATTC 



Figure 1: Partial Digest Problem 

Recovering the locations of the restriction sites from the measured fragment lengths is 
known as the partial digest problem (a.k.a turnpike problem, see [9j). The locations of the 
restriction sites correspond to the set of integers V ', and the measured fragment lengths 
correspond to the set of pairwise distances W. 

2 Contributions 

Researchers have proposed a wide range of heuristics jTQl [H] to solve the phase retrieval 
problem, a brief summary of which can be found in [12]. [15] provides a theoretical framework 
to understand the heuristics, which are in essence an alternating projection between a convex 
set and a non-convex set. The problem with such an approach is that convergence is often to a 
local minimum, hence chances of successful recovery are less. Also, no theoretical guarantees 
can be provided. 




Starting Point 



Figure 2: Alternating projection between a convex and a non-convex set 
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A variant of the partial digest problem, known as the turnpike problem, which is the 
problem of recovering an integer set from their pairwise distance multiset (multiplicity infor- 
mation of each pairwise distance also available) is also well studied. The most widely used 
algorithm to do this recovery is a worst-case exponential algorithm based on backtracking 
[T6] . [9] provides a comprehensive summary of the existing algorithms. The question of 
unique provable recovery using polynomial-time algorithms remains unanswered, and ad- 
vanced equipments have been used in practice to obtain additional information to solve the 
problem [TTt[T8]. 

In many applications of these problems, the underlying signals are naturally sparse. 
For example, astronomical imaging deals with the locations of the stars in the sky, X-ray 
crystallography deals with the density of atoms and so on. In DNA restriction site analysis, 
it is very reasonable to assume that the restriction sites are sparsely distributed. 

Recently, some attempts have been made to exploit sparsity. An alternating projection 
based heuristic was proposed in [19] , a semidefinite relaxation based heuristic was explored in 
[20] . In this work, we develop a probabilistic polynomial-time algorithm which can provably 
recover the underlying set uniquely if it is 0{n l / 2 ~ e ) sparse. 



Suppose V = {vo, vi, Vk-i} is a set of k integers and W = {wq^ w\, wk-i} is its 
pairwise distance setj^j 

Theorem 3.1 (Main Result). V can be recovered uniquely (upto linear shift and reversal) 
from W in polynomial-time with probability greater than 1 — 5 for any 5 > if 

(i) 3 n > wk-i such that V is chosen uniformly at randon^from {0, 1, — 1} 

(ii) k = Oin 1 / 2 - 6 ) 
(Hi) n > n(e, 5) 

In order to overcome the trivial ambiguity of linear shift and reversal, we attempt to 
recover the equivalent solution set U = {^o,^i, defined as follows 



i.e., the equivalent solution set U we attempt to recover has the following properties: 

(i) ^o = 

(ii) ui - u < u k _i - u k _ 2 

$The elements of V and W are assumed to be in ascending order without loss of generality for convenience 
of notation, i.e., vq < Vi < .... < v^-i and wq < w\ < .... < wk-i 

*Can use other distributions too. For example, each integer belongs to the set V with probability - 
independent of each other 



3 Main Result 




V - V if V 1 -V < V k -1 ~ Vk-2 

Vk-i — V otherwise 



(6) 
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4 Theory 



Let Uij — Uj — Ui for < i < j < k — 1. With this definition, W = {uij :0<i<j<k— 1} 
and U = {u 0j : < j < k - 1}. Note that t/ C W. 



4.1 Intersection Step 

The key idea of this step can be summarized as follows: suppose we know the value of u 0p 
for some p, if U p and W p are defined as 



U p = {u 0j : p < j < k - 1} & W p = W + u 0p 



then we have 



UpCwnWp 



(7) 



(8) 



The idea can be extended to multiple intersections. Suppose we know {uo ip : 1 < p < i}, 
we can construct {Wi p : 1 < p < t] and see that 



u it cwn (P\w ip 



(9) 



4.2 Graph Step 

For a given integer set Z = {z , z\, ....z\z\-i}, construct a graph G(Z) with \Z\ vertices such 
that there exists an edge between Z{ and Zj iff the following is satisfied 



Vz q , z h e Z, 



z h ^Zi- Zj unless (i,j) = (g,h) 



(10) 



i.e., there exists an edge between two vertices if their corresponding pairwise distance is 
unique. For example, consider the integer set Z = {2,5,19,53,67,84}. The graph G(Z) 
looks as shown in Figure [3| 




Figure 3: The graph G(Z) for Z = {2, 5, 19, 53, 67, 84} 
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There exists an edge between 2 and 5 as it is the only pair of integers with difference 
3, there doesn't exist an edge between 5 and 19 because there is another pair {53, 67} with 
difference 14 and so on. 

The main idea of this step is as follows: suppose we construct a graph G(Z) where 
U C Z C W. If there exists an edge between a pair of integers in {zi,Zj} G Z such that 
Zi — Zj G W ) then Zj} G U . This holds because if {z^, Zj} ^ U ) then since Z{ — Zj G W 
there has to be a pair of integers in U which have a pairwise distance zi — z^ which would 
contradict the fact that there is an edge between zi and Zj. 

5 Algorithm 



Algorithm 1 Integer Recovery Algorithm 
Input: Pairwise distance set W 
Output: Integer set U which realizes W 

1. Infer Uqi from W 

2. Construct the set W\ = W + u i 

lfk = 0{n l ' A - e ) 

3. Calculate U 1 = Wf)W 1 

4. Recover U = {0} U U x 

Else if k = 0{n l ' 2 - e ) 

5. Construct the graph G({0} U (W n Wi)) and infer {u 0ip :l<p<t = log(k)} 

6. Construct the set Wi p = W + ^oi p for 1 < p < t 




9. 



8. 



Define [7 = {£t 0j ...Uk-i} as f/ = — C7 and infer {i^: 1 < p < t} from C/ 2 - 
Construct the set W p = W + u 0p for 1 < p < t = log(k) 




11. 



Recover [7 = {^ 0p : < p < t - 1} U U t 



12. 



Recover U = Uk-i — U 



6 



Algorithm [T] outlines the steps needed to recover the set of integers from the set of their 
pairwise distances. Lemma 6.1 explains step 1. The proof for step 3 is given in Corollary 



6.1 The proof for Step 5 is given in Lemma 6J5. Lemma [6.7| provides the proofs for steps 7 
and 10. 



5.1 Circular Pairwise Distances (Beltway problem) 

In the phase retrieval setup, if the autocorrelation is defined circularly, i.e. 

def 



CLi 



(x*x)i 



(11) 



where the indices are considered modulo n, the integer recovery problem can be stated as 



find V 

subject to {(i - j)mod(n) | (ij) e V} = W (12) 

Algorithm [I] can be modified to solve this problem. Suppose = (uj — i^)mod(n) for 
< i^j < k — 1. The intersection step in this case can be modified as follows: suppose we 
know the value of u 0p for some if W p is defined as 

W p = (W + ^op)mod(n) (13) 

then we have 

ucwnw p (14) 

The idea can be extended to multiple intersections. Suppose we know {uo ip : 1 < p < t}, we 
can construct {Wi p : 1 < p < t] and see that 

ucwn(^)w^j (is) 

The graph step can be modified by having an edge between Zi and Zj iff the following is 
satisfied 

Vz g , z h G Z, (z g - z h )mod(n) ^ (z { - ^)mod(n) unless (ij) = (g, h) (16) 

Algorithm [2] outlines the steps needed to recover the set of integers from the set of their 
circular pairwise distances. 
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Algorithm 2 Integer Recovery Algorithm 
Input: Circular pairwise distance set W 
Output: Integer set U which realizes W 

1. Infer uqi from W 

2. Construct the set W\ = (W + 'Uoi)mod(n) 

3. Calculate W fl Wi, infer u 02 

4. Construct the set W 2 = (W + u 2)mod(n) 

5. Construct the graph G{W n W\ n W 2 ) and infer {^ ; p : 1 < p < t = log(k)} 

6. Construct the set Wi p = (W + UQ ip )mod(n) for 1 < p < t 



The proofs for all the steps are similar to Algorithm 1, except for steps 1 and 3. In step 
1, Uqi can be chosen as the minimum non-zero value in W without loss of generality. In step 
3, a choice has to be made between clockwise and anti-clockwise completion after selecting 
uqi j and can be done as follows: W fl W\ has terms {0, ^02} and {^01, ^21 = ^01 — ^02} with 
pairwise distance ^02 (distances considered modulo n). It can be seen that if k = 0(n 1 ^ 2 ~ e ) ) 
then Uq2 is the minimum number greater than uqi in W with this property with arbitrarily 
high probability. Hence, ^02 can be chosen by using the minimum value c in W greater than 
uqi j such that two pairs have a pairwise distance c, one of them being {0, c} and the other 
being {^01,^01 - c}. 

6 Proof of Main Theorem 

Suppose V is a fc-element subset of {0,1,...., n — 1} chosen uniformly at random, k = 
(^(n 1 / 2 " 6 ) and n > n(e,5). 

Lemma 6.1. uqi can be inferred from W . 

Proof. The first and second highest terms in W are given by uq^-i and Ui^-i respectively 
since uqi < Uk-2,k-i- Since uqi = uq^-i — ^i,fc-ij we can calculate uqi from W . □ 




Lemma 6.2. Probability that an integer I belongs to W is 



(1) 1 if I e U. 

(ii) less than or equal to ^ + o(^) if I £ U. 
Proof (i) Since QeU,ifleU then I e W. 
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(ii) If I £ U ) probability that an index I belongs to W can be bounded as follows 

Pr{l eW \1<£U} = Pr{{ \g,g + leV\l<£U}<— + o[ — ] (17) 

w n \ n J 

9 X 7 

□ 

Lemma 6.3. The probability that an integer I not in U belongs to W D W2 is less than or 
equal to j£ + o(£) 

Proof. 

Pr{l ewn W 2 \l £U} = Pr{Z, l-u 01 e W\l i U} (18) 
<Y,Pr{uoi = d] (^Pr{g,g + l,h,h + l-deV\(0,d)eU,(l,^d-lJ)^U}\ 

d \g,h J 

(19) 

*X>W = ->g + (20) 

□ 

Lemma 6.4. The probability that an index I in W , which is not in U , belongs to W H W2 is 
less than — + o(-) 

Proof. 

□ 

Lemma 6.5. Pr{u ot > Uk-k<*,k-i} < °~ f or an V a,8 > if t = log(k). 
Proof. Since the integers are chosen uniformly at random, we have 

tn 2tn 2 [2tn 2 \ 

E[u ot ] = J2 E h,i+i] < j & var[u ot ] = E[u 2 0t ] - E[u 0t } 2 < — + o I — ] (22) 

i=0 ^ ' 

Using Chebyshev's inequality, we get 

k a/2 n, J^ + {l^ 2t 2t\ 

Pr{u ot > —— } < ^ ^—s- < — + o — 23 

Similarly, we have 

i r , 2fc a n 2 [2k a n 2 \ . lS 
E[u k -k<* t k-i\ = ^ & var{it fc -fc",fc-i} < — p hoi fc2 1 (24) 



/0 2k a n 2 _\ _ / 2fe a n 2 i 



and 

Pr{ tIt _ t , t _ 1 <^}< (t? _ o( ^ < p; + • (25) 

Hence, we can bound the desired probability as follows 

2£ / 2£ \ 

Pr{^ot > Wjfe-jfe«,jfe-i} < + o ( — j < 5 (26) 

for any 5 > 0. □ 

Lemma 6.6. in £/ie grap/i G({0} U (W fl W2)); integers {u 0p : 1 < p < t = log(k)} have an 
edge with uq^-i w ith probability greater than 1 — 5 for any 5 > 0. 

Proof. For any fixed p such that 1 < p < t : note that terms u 0p and u ^-i have a pairwise 
distance u p ^-i- For there to be no edge between u 0p and uq^-Ii another integer pair should 
have the same pairwise distance. For this to happen, atleast one of the integers should be 
greater than u p ^-i- The only integers greater than u p ^-i in W can be {uij : < i < 
p — 1, j > i}. Hence, it is sufficient to show that none of these terms exist in W 2 with the 
desired probability. These terms can be split into two cases: 

(i) j <k- k a : 

Using Lemma |6.5[ we see that this event can be bounded with probability 5 for any 
S > 0. 

< u ot + u P j < Uk-k a ,k-i + v> P ,k-k a = u Pi k-i (27) 

(ii) k-k a < j <k 

There are k a such terms for a given p. If all the k a terms don't survive W D W23 we 
can be assured of an edge between u 0p and uq^-i- 

Hence, if a total of tk a terms in W don't survive W(l W<i, we are through. The probability 
that none of them survive W fl W2 can be upper bounded by tk a (^- + o(*-)) using union 
bounds and Lemma [674} This term can be made less than 8 for any 5 > 0. 

□ 

Lemma 6.7. The probability that an integer I not in U belongs to W fl (f]p=i Wi P ) ^ 5 ^ e55 
tfian or equal to (**i±f> )^/2 + (**±±f> )^/2 / or t = 

Proof. 

Pr{Z G W n (n< =1 W ip ) \l£U} = Pr{l, {I - u 0ip : 1 < p < t} G W\l £ U} (28) 
= Pr i u oi P = d p :l<p< t}Pr{l, {I -dp : 1 < p < t} G W\l £ U,u 0ip = d p : 1 < p < t} 

d±,..dt 

(29) 
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Using the proofs in [21J, we can show that the numbers {d p : 1 < p < t) have unique pairwise 
distances with probability 1 — 5 for any 5 > 0. Using this property, we see that at least \ft 
additional vertices in U (apart from {u^ : 1 < p < t}) and atmost 2t additional vertices are 
needed to realize the t pairwise distances. The probability can hence be bounded as 

2t /1X " ^ 2(1+C \vt/2 ^ 2(1+6) 



n ./2 < ( _ yyz + o( _ wt/a (go) 

n n 



--VI 

for * = /o#(/c). □ 



Lemma 6.8. Ui t = W fl ^Hp=i ™^ probability greater than 1 — 5 for any 5 > if 

t > log(k) 

Proof The expected number of terms in W fl ^Op=i no ^ ^ n ^ can be calculated using 
linearity property of expectation as 



/ / 1.2(1+6) \ / r 2(1+6) \ v^/2\ 

E[#i ewn (nl =1 W ip ) \l£U]<nl [—) + o [-^) I < * (31) 

for any 5 > 0. Using Markov inequality, we get 

Pr{#/ e w n (ri =1 W ip ) \1<£U>1}<5 (32) 
which completes the proof. □ 

Corollary 6.1. [7 2 = D W2 with probability greater than 1 — 8 for any S > if k = 

0{n l / A ~ e ), n > n(e,6). 

7 Numerical Simulations 

Consider the following example: 

W = {0, 2, 3, 5, 8, 12, 14, 17, 30, 33, 37, 38, 49, 51, 

52, 54, 57, 60, 68, 71, 76, 89, 90, 94, 97, 101, 
103, 106, 108, 109, 111, 114, 127, 128, 139, 
141, 144, 165, 177, 179, 182} 
We can infer u i = 182 - 179 = 3 from W. Construct W\ = W + «oi and calculate W fl W\. 

W n W± = {3, 5, 8, 17, 33, 52, 54, 57, 60, 71, 

97, 106, 109, 111, 114, 144, 182} 
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Construct G({0} U (W n W x )) to see that 

{0}o{5,17} 

from which we can infer u 02 5 and ^03 = 17. Construct W 2 — + ^02 and W 3 = W + ^03 
and calculate ^Hp=o ^p) 

^ = ^ 54 ' io6 ' in ' n4 ' i44 ' i82 ^ 

Calculate U = {u 0p : < p < 3} (J (f^o ^p) 

/7 = {0, 3, 5, 17, 54, 106, 111, 114, 144, 182} 

Simulations were performed by choosing k-element subsets V uniformly and randomly 
between {0, 1, — 1} for different values of n and k. Figure [i] plots the probability of 
successful recovery for n = 512 and n = 1024. 




60 
sparsity k 
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